My Report

25Winter

data: books.csv

Author

Cathy

Published

July 1, 2025

Books dataset with Quarto

Investigating the books dataset within Quarto

Loading the required packages

library(ggplot2)
library(dplyr)


Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

Loading the data

books <- read.csv("../../../../data/books.csv")
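Because the data is read through a long relative path, the call fails with a fairly generic error if the report is rendered from a different directory. A minimal sketch of a guard around the read step is shown here. The helper name `read_books` is hypothetical and not part of the original report.

```r
# Hypothetical helper (not in the original report): stop with a clear
# message when the CSV is missing, otherwise read it with read.csv().
read_books <- function(path) {
  if (!file.exists(path)) {
    stop("Could not find the data file at: ", path)
  }
  read.csv(path)
}

# Demonstration on a temporary file with made-up rows,
# so the sketch runs without books.csv being present.
demo_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(bookID = 1:2, title = c("A", "B")),
  demo_path,
  row.names = FALSE
)
demo_books <- read_books(demo_path)
names(demo_books)
```

In the report itself the call would be `read_books("../../../../data/books.csv")`, assuming the same relative path as above.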

The books dataset includes the following fields:

names(books)

 [1] "X"                  "bookID"             "title"             
 [4] "authors"            "average_rating"     "isbn"              
 [7] "isbn13"             "language_code"      "num_pages"         
[10] "ratings_count"      "text_reviews_count" "publication_date"  
[13] "publisher"

This graph shows the volume of publications by language code

books |> 
  ggplot(mapping = aes(x = language_code)) +
  geom_bar()
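The bar chart counts rows per language code. The same counts can be tabulated directly, which may help when one bar dwarfs the rest. The sketch below uses a small made-up data frame (`toy_books`, hypothetical) rather than the real file, so the numbers are illustrative only.

```r
library(dplyr)

# Hypothetical toy data standing in for books.csv (values are made up)
toy_books <- data.frame(
  language_code = c("eng", "eng", "eng", "en-US", "fre", "fre", "spa"),
  stringsAsFactors = FALSE
)

# One row per language code, with the most frequent code first.
# This is the same tally that geom_bar() draws as bar heights.
language_counts <- toy_books |>
  count(language_code, sort = TRUE)

language_counts
```

Running `count(books, language_code, sort = TRUE)` on the real data should give the table behind the plot above.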

Because English-language books dominate the dataset, we will create a subset that excludes the codes ‘eng’ and ‘en-US’.

non_english_books <- filter(books, !language_code %in% c("en-US", "eng"))
non_english_books |> 
  ggplot(mapping = aes(x = language_code)) +
  geom_bar()
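The subset above relies on negating `%in%`, which keeps every row whose code is not in the listed set. A minimal sketch on made-up data (`toy_books` and `english_codes` are hypothetical names) shows the behaviour, including the fact that only the exact codes listed are dropped.

```r
library(dplyr)

# Hypothetical toy data standing in for books.csv (values are made up)
toy_books <- data.frame(
  title = c("A", "B", "C", "D", "E"),
  language_code = c("eng", "en-US", "fre", "spa", "en-GB"),
  stringsAsFactors = FALSE
)

english_codes <- c("eng", "en-US")

# Keep only rows whose language_code is NOT one of english_codes
toy_non_english <- toy_books |>
  filter(!language_code %in% english_codes)

toy_non_english$language_code
```

Note that the toy row coded `en-GB` survives the filter. If the real data contains other English variants, they would remain in `non_english_books` unless their codes are added to the excluded set.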

#| fig-cap: "Number of Non-English Books Published"

ggplot(data = non_english_books,
       mapping = aes(x = num_pages,
                     y = publisher)) +
  geom_point()